ABSTRACT

Using population synthesis tools we create a synthetic Kepler Input Catalogue (KIC)
and subject it to the Kepler Stellar Classification Program (SCP) method for
determining stellar parameters such as the effective temperature T_eff and surface
gravity g. We achieve a satisfactory match between the synthetic KIC and the real
KIC in the log g vs log T_eff diagram, while there is a significant difference
between the actual physical stellar parameters and those derived by the SCP of the
stars in the synthetic sample. We find a median difference ΔT_eff = +500 K and
Δlog g = -0.2 dex for main-sequence (MS) stars, and ΔT_eff = +50 K and
Δlog g = -0.5 dex for giants, although there is a large variation across parameter
space. For an MS star the median difference in g would equate to a ~3% increase in
stellar radius and a consequent 3% overestimate of the radius for any transiting
exoplanet. We find no significant difference between ΔT_eff and Δlog g for single
stars and the primary star in a binary system. We also re-created the Kepler target
selection method and found that the binary fraction is unchanged by the target
selection. Binaries are selected in similar proportions to single star systems; the
fraction of MS dwarfs in the sample increases from about 75% to 80%, and the giant
star fraction decreases from 25% to 20%.

Key words: binaries: general - Galaxy: stellar content - planetary systems - stars: 
evolution - stars: statistics - surveys 



1 INTRODUCTION 



The NASA Kepler mission (Borucki et al. 2010) is designed
to detect transiting exo-earths in habitable zones around
solar-like stars. To achieve this goal Kepler is monitoring
about 150,000 stars for 3 or more years. The target stars
were selected from a larger list, the Kepler Input Catalogue
(KIC), according to a set of criteria that rank stars in order
of the likelihood to display detectable transits of exo-earths
in the habitable zone (Batalha et al. 2010). The KIC covers
the 116 square degrees of the Kepler field (Koch et al. 2010)
and contains about 450,000 stars with magnitude brighter
than K_p = 16 (where K_p is the magnitude in the Kepler
band). This catalogue was established to derive physical
parameters for objects in Kepler's field of view and to allow
the selection of a set of optimal targets that would maximise
Kepler's chance of detecting an Earth-sized transit around
a Sun-like star (Brown et al. 2011). The KIC itself was
compiled from a ground-based survey using broad-band Sloan
Digital Sky Survey (SDSS) filters with a flux precision of
2%.

Kepler's Stellar Classification Program (SCP) (Brown
et al. 2011) derived basic physical parameters of all KIC
stars, chiefly the effective temperature T_eff, surface gravity
g, and metallicity Z, and, by comparison with suitable
stellar models, the stellar mass, radius and age, using only the
observed broad-band magnitudes and colours of these stars
as an input. The target selection in turn is based on these
SCP-derived stellar parameters.

These SCP-derived parameters may suffer from random 
and systematic uncertainties introduced because the mea- 
sured magnitudes of a star may differ from its true, intrinsic 
magnitudes, and because colours alone will not always un- 
ambiguously deliver appropriate estimates of the physical 
parameters. This will in turn translate into a bias of the 
statistical properties of samples drawn from Kepler data, 
including the exoplanet candidate sample itself, or the sam- 
ple of binary stars with Kepler light curves. We note that 
stellar parameters are also needed to derive the properties
of any transiting planet that is detected (Torres, Winn &
Holman 2008), but for confirmed planets the SCP parameters
are unlikely to be the sole or main source for the stellar
parameters.

It is therefore important to critically examine the per- 
formance of the SCP approach, and the consequences of any 
inherent systematic bias for the actual Kepler target list, and 
for subsamples created from Kepler data. To this end we aim 
to create a synthetic version of the KIC, obtained by pop- 
ulation synthesis calculations that include self-consistently 
evolved binary systems. We validate the population model
against the actual KIC in colour-magnitude space, and
employ the SCP technique to derive "apparent" stellar
parameters for all stars in the synthetic sample, i.e. exclusively from
their magnitudes in different colour bands. We then investigate
the difference between the actual, physical parameters
of our synthetic stars, and their SCP-derived parameters.

Due to bandwidth limitations Kepler does not observe
every object in the KIC; instead a target list is drawn up that
aims to maximise the science return on the targets observed.
This list is determined on the basis of the SCP-derived
parameters and the expected flux levels, aiming to increase the
fraction of Sun-like stars and decrease the fraction of giants
in the sample. In this study we wish to reproduce the target
selection procedure and apply it to the synthetic sample of
the KIC, to quantify the resulting bias against giants on the
basis of the actual, physical parameters of the synthetic KIC
stars.

The paper is organised as follows. In Section 2 we
describe our population synthesis model, including updates we
made to the existing BiSEPS code. Section 3 deals with the
derivation of the Kepler target list and a comparison of our
work to the real KIC. In Section 4 we compare the real
physical parameters of our synthetic sample to those derived
from the SCP. We then investigate the bias introduced by
the target selection method. In Section 5 we discuss the
significance of various assumptions made in our analysis, while
in Section 6 we summarize our main findings.



2 POPULATION SYNTHESIS MODEL 

To calculate a model for the stellar and binary star
population in the Kepler field-of-view we added new input physics
and functionality to the Binary and Stellar Evolution
Population Synthesis (BiSEPS) code, which was originally
described in Willems & Kolb (2002, 2004) and later employed
by Willems et al. (2006) in a simplified way to study the false
positive rate in the exoplanet transit search project SuperWASP
(Pollacco et al. 2006) from shallow-eclipsing binaries.
BiSEPS in turn is based on the analytical descriptions of
stellar and binary evolution by Hurley, Pols & Tout (2000)
and Hurley, Tout & Pols (2002).



2.1 Binary evolution 

At the core of the population synthesis scheme is a large
library of single star and binary system evolutionary tracks
from the ZAMS up to a maximum age of 13 Gyr, providing
physical parameters for typically 100 time steps suitably
distributed along the tracks. The stellar evolution scheme takes
into account mass loss via winds, Roche lobe overflow, and
angular momentum losses due to gravitational wave radiation
and magnetic braking (see Willems & Kolb (2004) for
references). A newly forming binary system is taken to be
fully characterised by the initial masses of its components,
the orbital separation, and the stars' chemical composition,
set here with hydrogen abundance X = 0.70 and metallicity
Z (we consider either Z = 0.020 or Z = 0.0033). All
systems start on circular orbits and are forced to remain
circular throughout their evolution.

The initial parameter space is divided into 50 logarithmically
spaced equidistant bins of initial masses M_1 and M_2
between 0.1 and 20 M_⊙ and into 250 logarithmically spaced
equidistant bins of initial semi-major axes a between 3 R_⊙
and 10^6 R_⊙. By symmetry, only objects where M_1 ≥ M_2
are evolved. Single star tracks are obtained from the
primary star tracks in very wide, non-interacting binaries (with
a = 10^7 R_⊙).
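
As an illustration of the size of this grid, the following minimal sketch builds the logarithmic bins and counts the tracks to be evolved. It assumes that each bin is represented by its geometric centre and that the symmetry condition is M_1 ≥ M_2; the function and variable names are illustrative and are not taken from BiSEPS.

```python
import numpy as np

N_MASS, N_SEP = 50, 250  # numbers of bins quoted in the text


def log_bin_centres(lo, hi, n):
    """Centres of n bins that are equidistant in log10 between lo and hi."""
    edges = np.logspace(np.log10(lo), np.log10(hi), n + 1)
    return np.sqrt(edges[:-1] * edges[1:])  # geometric mean of each bin


masses = log_bin_centres(0.1, 20.0, N_MASS)  # solar masses
seps = log_bin_centres(3.0, 1.0e6, N_SEP)    # solar radii

# Keep only pairs with M1 >= M2 (assumed convention); the rest follow by symmetry.
pairs = [(m1, m2) for m1 in masses for m2 in masses if m1 >= m2]
n_tracks = len(pairs) * len(seps)
print(len(pairs), n_tracks)
```

With 50 mass bins the symmetry condition leaves 50 × 51 / 2 mass pairs, each of which is combined with every separation bin.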



2.2 Galactic Model 

Underpinning the spatial distribution of the synthetic stars
is a simple kinematic model of the Galaxy, described in detail
in Willems, Kolb & Justham (2006) (and references therein).
The Galaxy is assumed to comprise a young thin disc and an
older thick disc. Each disc's stellar distribution is modelled
as a double exponential of the form



\Omega(R, z) = n_0 \exp\left(-\frac{R}{h_R}\right) \exp\left(-\frac{|z|}{h_z}\right) \qquad (1)



with h_R = 2.5 kpc and h_z = 300 pc for the thin disc and
h_R = 3.8 kpc, h_z = 1 kpc for the thick disc. The integral
is normalised to unity, thus n_0 = 1/(4π h_R^2 h_z). We assume
that star formation proceeded for the first 3 Gyr after the
formation of the Galaxy in the thick disc and then
continued until the current epoch (13 Gyr) in the thin disc.
During the respective star forming periods the star formation
rate is taken to be constant in each disc, such that one
star or binary with component mass M > 0.8 M_⊙ is
produced per year (Weidemann 1990). To capture the essence
of the metallicity evolution with Galactic age we go beyond
Willems et al. (2006) and assume that thick disc stars have a
metallicity Z = 0.0033 (Gilmore, Wyse & Jones 1995) while
stars forming in the thin disc have a solar metallicity value
of Z = 0.020 (Haywood 2001).

To obtain the total number of systems in a given survey
field the stellar density as defined in equation (1) is numerically
integrated over Galactic longitude, latitude and distance (l, b, d)
by translating it from galactocentric (R, z) to heliocentric
coordinates (l, b, d) via

R = \left(d^2 \cos^2 b - 2 d R_\odot \cos b \cos l + R_\odot^2\right)^{1/2}

z = d \sin b + z_\odot, \qquad (2)

where R_⊙ = 8.5 kpc is the radial distance of the Sun from
the Galactic centre (Reid 1993) and z_⊙ = 30 pc is the height
of the Sun above the Galactic plane (Chen et al. 2001).
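
The density and coordinate transformation can be sketched as follows. This is a minimal illustration and not the BiSEPS implementation: the function names are ours, angles are taken in radians and lengths in kpc, and the normalisation uses the fact that the double exponential integrates to 4π h_R^2 h_z over the whole Galaxy.

```python
import numpy as np

R_SUN_GAL = 8.5  # kpc, galactocentric distance of the Sun
Z_SUN = 0.030    # kpc, height of the Sun above the plane


def galactocentric(l, b, d):
    """Convert heliocentric (l, b in radians, d in kpc) to (R, z) in kpc."""
    R = np.sqrt(d**2 * np.cos(b)**2
                - 2.0 * d * R_SUN_GAL * np.cos(b) * np.cos(l)
                + R_SUN_GAL**2)
    z = d * np.sin(b) + Z_SUN
    return R, z


def disc_density(R, z, h_R, h_z):
    """Double-exponential disc normalised to unit total number of stars."""
    n0 = 1.0 / (4.0 * np.pi * h_R**2 * h_z)
    return n0 * np.exp(-R / h_R) * np.exp(-np.abs(z) / h_z)


# Thin-disc density 1 kpc along an illustrative sight line near the Kepler field.
R, z = galactocentric(np.radians(76.3), np.radians(13.5), 1.0)
print(disc_density(R, z, h_R=2.5, h_z=0.3))
```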

For each system the integral over distance is carried
out between the minimum and maximum distance, d_min and
d_max, at which this system can be seen. If the survey is
magnitude-limited these are determined by d = 10^((m - M + 5 - A_λ)/5), where
m is the lower or upper magnitude limit of the survey, M
the absolute magnitude and A_λ is the extinction along a line
of sight at (l, b) in the filter band of the survey.

The target magnitude range of the Kepler mission is
8 ≤ K_p ≤ 16 (Batalha et al. 2010), and it is this interval
we used for computing the synthetic KIC. We extend our
simulations to include stars down to K_p = 19 so that we
can estimate the background flux levels, and we model bright
stars up to, arbitrarily, K_p = 0, to include the few bright
objects that will saturate the detector but not be observed
as target objects.

As the extinction itself depends on the distance, we
calculate the distance limits for the integral iteratively. We
upgraded the BiSEPS extinction routine described in Willems
et al. (2006) to that of Drimmel et al. (2003), which calculates
the Galactic extinction from a 3D dust model of the Galaxy
that has been scaled using data from the COBE/DIRBE
NIR instrument to provide extinction values along lines of
sight in the V-band (see also Section 2.6 below).
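
The iterative determination of a distance limit can be sketched as a damped fixed-point iteration on d = 10^((m - M + 5 - A(d))/5). The damping, the tolerance and the toy extinction law below are our own assumptions for illustration; the actual calculation uses the Drimmel et al. (2003) model along each line of sight.

```python
import math


def distance_limit(m_lim, abs_mag, extinction, tol=1e-6, max_iter=500):
    """Distance in pc at which a star of absolute magnitude abs_mag has apparent
    magnitude m_lim, given a callable extinction(d_pc) in magnitudes."""
    d = 10.0 ** ((m_lim - abs_mag + 5.0) / 5.0)  # extinction-free first guess
    for _ in range(max_iter):
        d_new = 10.0 ** ((m_lim - abs_mag + 5.0 - extinction(d)) / 5.0)
        if abs(d_new - d) <= tol * d:
            return d_new
        d = 0.5 * (d + d_new)  # damping keeps the iteration stable
    return d


def toy_extinction(d_pc):
    """Hypothetical law: 1 mag per kpc, saturating at 1.5 mag."""
    return min(1.0e-3 * d_pc, 1.5)


d_max = distance_limit(16.0, 4.8, toy_extinction)
print(d_max)
```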



2.3 The Kepler field 



The integration boundaries for Galactic longitude and lat- 
itude are determined by the location of the field-of-view of 
Kepler's 42 CCDs; each of these is split into two distinct 
regions along the channel boundary. 

For practical reasons we define integration regions
bounded by lines of constant l and b that enclose the
individual Kepler CCD areas, and split these regions up into
smaller boxes. The result of the volume integration for each
of these boxes is then weighted proportional to the fraction
of its area that overlaps with its respective CCD channel,
using the ConvexIntersect routine from O'Rourke (1995).
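
The area weighting can be illustrated with a short sketch. Note that this is not O'Rourke's ConvexIntersect routine: as a stand-in we use Sutherland-Hodgman clipping, which gives the same overlap polygon when the clipping region is convex. Treating (l, b) as planar coordinates and requiring counter-clockwise vertex order are assumptions of the sketch.

```python
def polygon_area(poly):
    """Shoelace area of a polygon given as a list of (x, y) vertices."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def clip_convex(subject, clip):
    """Clip `subject` against a convex, counter-clockwise polygon `clip`."""
    def inside(p, a, b):
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0.0

    def intersect(p, q, a, b):
        # Intersection of segment pq with the infinite line through a and b.
        dx1, dy1 = q[0] - p[0], q[1] - p[1]
        dx2, dy2 = b[0] - a[0], b[1] - a[1]
        t = ((a[0] - p[0]) * dy2 - (a[1] - p[1]) * dx2) / (dx1 * dy2 - dy1 * dx2)
        return (p[0] + t * dx1, p[1] + t * dy1)

    out = list(subject)
    for a, b in zip(clip, clip[1:] + clip[:1]):
        if not out:
            break
        inp, out = out, []
        for p, q in zip(inp, inp[1:] + inp[:1]):
            if inside(q, a, b):
                if not inside(p, a, b):
                    out.append(intersect(p, q, a, b))
                out.append(q)
            elif inside(p, a, b):
                out.append(intersect(p, q, a, b))
    return out


def overlap_fraction(box, ccd):
    """Fraction of the area of `box` that lies inside the convex polygon `ccd`."""
    inter = clip_convex(box, ccd)
    return polygon_area(inter) / polygon_area(box) if len(inter) >= 3 else 0.0
```

The returned fraction is the weight applied to the volume integral of the corresponding box.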



The numerical volume integration for each box makes
use of a Romberg integral following Press et al. (1992). This
divides the volume up into at least 2^5 intervals in each of
the directions l, b, d, and iteratively increases the number of
intervals by factors of 2 up to a maximum of 2^10 intervals,
until the integral changes from one iteration to the next
by less than 0.1%. If this condition is not met once 2^10
intervals are reached, the integral obtained for 2^10 intervals is
used. We found that decreasing the cut-off to below 0.1% did
not significantly alter the results, while markedly increasing
the computational runtime. Very few integration areas ever
needed more than 2^5 intervals.
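
A one-dimensional sketch of this scheme is given below; the three-dimensional integral over (l, b, d) can be built by nesting such calls. The function name and the default arguments (a minimum of 2^5 and a maximum of 2^10 intervals, and a 0.1% tolerance) are our reading of the description above rather than the Press et al. (1992) routine itself.

```python
import math


def romberg(f, a, b, rel_tol=1e-3, k_min=5, k_max=10):
    """Romberg integration of f on [a, b] using 2**k intervals, k_min <= k <= k_max.

    Returns (estimate, k), where 2**k is the number of intervals finally used.
    """
    h = b - a
    prev_row = [0.5 * h * (f(a) + f(b))]  # trapezoid rule with one interval
    for k in range(1, k_max + 1):
        h *= 0.5
        # Refine the trapezoid estimate by adding the new midpoints.
        mid = sum(f(a + (2 * i - 1) * h) for i in range(1, 2 ** (k - 1) + 1))
        row = [0.5 * prev_row[0] + h * mid]
        # Richardson extrapolation along the row.
        for j in range(1, k + 1):
            row.append(row[j - 1] + (row[j - 1] - prev_row[j - 1]) / (4 ** j - 1))
        if k >= k_min and abs(row[-1] - prev_row[-1]) <= rel_tol * abs(row[-1]):
            return row[-1], k
        prev_row = row
    return prev_row[-1], k_max  # tolerance not met: use the 2**k_max result


value, k_used = romberg(math.sin, 0.0, math.pi)
print(value, 2 ** k_used)
```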



2.4 Population characteristics 

The system-specific observable volumes are then multiplied
by weighting factors determined from the distribution
functions of newly-formed stars and binaries to calculate the
total number of each type of binary and single star that are
visible in the Kepler field. We adopt an initial mass function
(IMF) with a slope -1.23 for 0.1 M_⊙ ≤ M_1 < 0.5 M_⊙, -2.2 for
0.5 M_⊙ ≤ M_1 < 1.0 M_⊙ and -2.7 for 1.0 M_⊙ ≤ M_1 (Kroupa
2001), for both single stars and the primary star of binary
systems. The secondary mass is selected from a flat initial
mass ratio (IMR) distribution. The distribution of initial
orbital separations is assumed to be χ(log a) = 0.078636 for
3 ≤ a/R_⊙ ≤ 10^6. The lower limit is a simplistic cut (Hurley
et al. 2002), while binaries beyond the upper limit are likely
to be disrupted by passing intergalactic stars (Heggie 1975).
Finally we assume that 50% of all systems form as binaries.
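
The quoted constant for the separation distribution is reproduced by normalising a distribution that is flat in the natural logarithm of a over the stated range, as the sketch below checks. The sampling function is purely illustrative (BiSEPS weights a fixed grid rather than drawing random systems), and its name and arguments are hypothetical.

```python
import math
import random

A_MIN, A_MAX = 3.0, 1.0e6                # solar radii
CHI = 1.0 / math.log(A_MAX / A_MIN)      # normalisation of a log-flat law


def draw_system(rng, m1, binary_fraction=0.5):
    """Draw (M1, M2, a) for one newly formed system; M2 and a are None for singles."""
    if rng.random() >= binary_fraction:
        return (m1, None, None)
    q = rng.random()                                        # flat mass ratio
    ln_a = rng.uniform(math.log(A_MIN), math.log(A_MAX))    # flat in ln a
    return (m1, q * m1, math.exp(ln_a))


print(round(CHI, 6))
print(draw_system(random.Random(42), 1.0))
```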



2.5 Bolometric corrections 

To expand the possible filter sets BiSEPS can deal with from
just the V band to the Johnson-Cousins-Glass UBVRIJHK,
Strömgren uvbyβ, Sloan ugriz and Kepler K_p bands, as well
as the custom D51 band used by the SCP for the KIC, we
have updated the bolometric corrections (BCs) from those
given by Flower (1996) to those of Girardi et al. (2002).
The BCs are provided as a function of T_eff and log g in the
form of tables for different metallicities. They are based on
the synthetic ATLAS9 spectra for stars between 3900 K <
T_eff < 50000 K and 0 < log g < 5 (Castelli, Gratton & Kurucz
1997), and the BDdusty1999 atmosphere models (Allard
et al. 2000) for stars with 700 K < T_eff < 3900 K; stars
with T_eff > 50000 K are treated as black-bodies. M
giants are treated separately by using the empirical spectra
of Fluks et al. (1994). These stellar spectra are integrated
over the filter response curve to derive the bolometric
corrections for a star in any filter system (see Girardi et al.
2002, for further details).

We perform a bi-linear interpolation over T_eff and log g
for tabulated metallicities either side of the target
metallicity and then a linear interpolation between the two
metallicities. If the parameters of a star place it outside of the range
provided by the tables in Girardi et al. (2002) we use the
closest point inside the tables, rather than risking
extrapolating the data. With the BC defined for a specific T_eff,
log g and metallicity we then calculate the absolute
magnitude M_x of a star for a specific filter x as



M_x = -2.5 \log\left(\frac{L}{L_\odot}\right) + M_{{\rm Bol},\odot} - BC_x \qquad (3)

where L is the star's bolometric luminosity, as delivered
by the evolutionary model, M_Bol,⊙ is the Sun's bolometric
magnitude and BC_x is the bolometric correction for a star
in filter band x. We calculate M_Bol,⊙ in a self-consistent
way from

M_{{\rm Bol},\odot} = M_{V,\odot} + BC_{V,\odot} \qquad (4)



We take the Sun to have T_eff = 5777 K and log g = 4.44,
giving BC_V,⊙ = -0.06 (Girardi et al. 2002). Defining the
visual apparent magnitude of the Sun to be V_⊙ = -26.76
implies M_V,⊙ = 4.81, and hence M_Bol,⊙ = 4.75 (see
Torres 2010 for a review).
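
The interpolation and magnitude calculation of this subsection can be sketched as follows. The table layout, the function names and the choice of interpolating linearly in whatever metallicity variable is passed in are assumptions of this sketch; the real tables are those of Girardi et al. (2002).

```python
import numpy as np

M_BOL_SUN = 4.75  # solar bolometric magnitude derived in the text


def bc_bilinear(teff, logg, teff_grid, logg_grid, bc_table):
    """Bilinear interpolation of bc_table[i, j] given at (teff_grid[i], logg_grid[j]).
    Points outside the table are clamped to its edge rather than extrapolated."""
    t = float(np.clip(teff, teff_grid[0], teff_grid[-1]))
    g = float(np.clip(logg, logg_grid[0], logg_grid[-1]))
    i = int(np.clip(np.searchsorted(teff_grid, t) - 1, 0, len(teff_grid) - 2))
    j = int(np.clip(np.searchsorted(logg_grid, g) - 1, 0, len(logg_grid) - 2))
    u = (t - teff_grid[i]) / (teff_grid[i + 1] - teff_grid[i])
    v = (g - logg_grid[j]) / (logg_grid[j + 1] - logg_grid[j])
    return ((1 - u) * (1 - v) * bc_table[i, j] + u * (1 - v) * bc_table[i + 1, j]
            + (1 - u) * v * bc_table[i, j + 1] + u * v * bc_table[i + 1, j + 1])


def bc_for_metallicity(teff, logg, z, z_lo, z_hi, table_lo, table_hi,
                       teff_grid, logg_grid):
    """Linear interpolation between the tables bracketing the target metallicity."""
    bc_lo = bc_bilinear(teff, logg, teff_grid, logg_grid, table_lo)
    bc_hi = bc_bilinear(teff, logg, teff_grid, logg_grid, table_hi)
    w = (z - z_lo) / (z_hi - z_lo)
    return (1.0 - w) * bc_lo + w * bc_hi


def absolute_magnitude(lum_solar, bc):
    """Equation (3): absolute magnitude in a band with bolometric correction bc."""
    return -2.5 * np.log10(lum_solar) + M_BOL_SUN - bc


# The Sun (L = 1 L_sun, BC_V = -0.06) recovers M_V = 4.81.
print(absolute_magnitude(1.0, -0.06))
```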



For the purposes of this work we follow the SCP and
use a combination of the g and r band magnitudes to derive
the K_p magnitude, using equations 2a and 2b from Brown
et al. (2011),

K_p = 0.1 g + 0.9 r \quad {\rm for}\ (g - r) \le 0.8
K_p = 0.2 g + 0.8 r \quad {\rm for}\ (g - r) > 0.8 \qquad (5)

2.6 Extinction



We obtain the extinction A_λ in a given filter band from
the extinction A_V in the visual band calculated from Drimmel,
Cabrera-Lavers & Lopez-Corredoira (2003) via a
filter-dependent coefficient A_λ/A_V (Girardi et al. 2008).
For simplicity we follow the SCP approach and adopt a single
value of A_λ/A_V for all stars in each filter band, neglecting
the real dependence on T_eff, log g and Z (Girardi et al. 2002).
We chose the coefficients of a 5000 K, log g = 4.0,
log(Z/Z_⊙) = 0.0 star (Girardi et al. 2002),
which are given in Table 1.
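
A minimal sketch of how these coefficients enter the apparent magnitudes, together with the K_p definition of equation (5), is given below. The dictionary holds the Table 1 values; the function names are ours, and the treatment of the boundary case g - r = 0.8 follows Brown et al. (2011) as we read it.

```python
import math

# Extinction coefficients A_lambda / A_V from Table 1.
EXT_COEFF = {"g": 1.193, "r": 0.868, "i": 0.681, "z": 0.490, "D51": 0.999}


def apparent_magnitude(abs_mag, d_pc, a_v, band):
    """Apparent magnitude in `band` at distance d_pc with visual extinction a_v."""
    return abs_mag + 5.0 * math.log10(d_pc) - 5.0 + EXT_COEFF[band] * a_v


def kepler_magnitude(g, r):
    """Kepler magnitude K_p from the g and r magnitudes (equation 5)."""
    if g - r <= 0.8:
        return 0.1 * g + 0.9 * r
    return 0.2 * g + 0.8 * r


g = apparent_magnitude(5.2, 800.0, 0.3, "g")
r = apparent_magnitude(4.7, 800.0, 0.3, "r")
print(g, r, kepler_magnitude(g, r))
```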



2.7 Creating a discrete sample 

The result of the above volume integration and weighting
with initial distributions is a multi-dimensional, continuous
(albeit binned) distribution function Γ that characterizes the
content of the Kepler field-of-view at the current epoch. The
total number N of stars and binary systems is defined by the
integral over these distributions. To obtain a synthetic KIC
which can then be subjected to the Kepler target selection
procedure we create a discrete synthetic sample of N stars
from this continuous distribution.

To this end we draw a random sample of N objects
from the distribution function Γ. Each object in the sample
is placed randomly at a location (l, b, d) inside the
field-of-view, based on the Galactic density, extinction and absolute
magnitude of the system. We obtained a number of different
random samples and found no significant difference in the
sample properties discussed below.

Filter    Extinction coefficient (A_λ/A_V)
g         1.193
r         0.868
i         0.681
z         0.490
D51       0.999

Table 1. Extinction coefficients for a 5000 K, log g = 4.0,
log(Z/Z_⊙) = 0.0 star, from tables provided in Girardi et al.
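
Drawing a discrete sample from a binned distribution can be sketched as below. This is an illustration under simple assumptions (the bin contents are expected numbers of systems, and N is their rounded sum); the function name is hypothetical and the placement of each object in (l, b, d) is not shown.

```python
import numpy as np


def draw_discrete_sample(expected_counts, rng):
    """Draw N = round(sum of expected_counts) objects from a binned distribution.

    Returns a tuple of index arrays, one per dimension of expected_counts,
    giving the bin that each sampled object falls into.
    """
    counts = np.asarray(expected_counts, dtype=float)
    flat = counts.ravel()
    n_total = int(round(flat.sum()))
    idx = rng.choice(flat.size, size=n_total, p=flat / flat.sum())
    return np.unravel_index(idx, counts.shape)


rng = np.random.default_rng(1)
bins = draw_discrete_sample([[0.0, 30.0], [70.0, 0.0]], rng)
print(len(bins[0]))
```

Repeating the draw with different seeds gives a direct way to check that sample properties are insensitive to the particular realisation.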



3 KEPLER TARGET LIST SELECTION 

Out of the possible 450,000 stars in the Kepler field, only
~150,000 can be observed at any one time due to
bandwidth limitations. Therefore Kepler uses a tailored target
list selected according to a number of criteria designed to
maximise the likelihood for the detection of Earth-like
transits in the star's habitable zone (Batalha et al. 2010). To be
able to generate a synthetic target list from our synthetic
sample that would reproduce the actual Kepler target list
we created our own model of the Kepler detector system
and target selection method. Following the procedures as
set out in Brown et al. (2011), Bryson et al. (2010) and
Batalha et al. (2010) this entailed the following principal
steps: (a) derive estimates of the system parameters from
broad-band colours using the SCP routine, (b) construct a
model of the expected S/N measured by each pixel, and then
(c) combine (a) and (b) to calculate the likelihood of detecting
Earth-like transits in the star's habitable zone.

In essence, to compile the target list the stars are ranked
in terms of the minimum radius R_p,min of a planet that can
still be detected securely in the absence of intrinsic stellar
noise within the 3.5 yr mission. The radius R_p,min is
obtained by requiring that the relative transit depth in flux F,
ΔF/F = (R_p/R_*)^2, where R_* is the stellar radius, exceeds a
suitable multiple of the light curve noise σ_tot. This becomes

R_{p,{\rm min}} = R_* \left(\frac{7.1\,\sigma_{\rm tot}}{r}\right)^{1/2} \qquad (6)

(Equation 7 of Batalha et al. 2010) where r is a crowding
metric, discussed below. Choosing to set the noise level
to 7.1σ also implies that there would only be one statistical
false positive signal due to random fluctuations in the light
curve (Batalha et al. 2010).

We now discuss the different factors in equation (6).



3.1 Stellar classification 

The determination of physical parameters of all KIC stars,
including the stellar radius R_*, is the remit of the SCP. This
uses a Bayesian posterior probability estimation method to
derive a star's T_eff, log g, log Z, luminosity, mass and radius
from its observed colours (Brown et al. 2011).

The two-step procedure is based on two sets of input
models. Stellar atmosphere models of Castelli & Kurucz
(2004) were used with filter response functions to determine
the expected colours for objects between 3500 K < T_eff <
50000 K, 0 < log g < 5.5 and -3.5 < log(Z/Z_⊙) < 0.5
(although not every gravity is available at every temperature),
while stellar evolution tracks of Girardi et al. (2000),
assuming a constant star formation rate and solar metallicity, link
these with the stellar mass and radius.

Bayesian priors based on the T_eff, log g distributions
of stars observed by the Hipparcos (Perryman et al. 1997)
satellite, the log Z distribution from Nordström et al. (2004)
and a Galactic distribution model from Cox & Pilachowski
(2000, p. 482) are employed to focus the search in
parameter space. The claimed advantage of a Bayesian approach
is that a prior rules out implausible systems which e.g. a
standard χ^2 minimisation technique might obtain. However,
shortcomings were noted in Brown et al. (2011): the
metallicity distribution was deemed questionable, T_eff is
unreliable for the hottest and coolest objects and there are
systematic errors in log g for objects with g - r > 0.65.

For each object in our synthetic sample we supply the
calculated g, r, i, z and D51 magnitudes as an input for the
SCP code, to estimate the object's physical parameters in
the same way as the SCP did for the stars in the real KIC^1
(Brown et al. 2011). The SCP code takes into account
magnitude uncertainties, and for simplicity we selected a value
of 0.02 mag in each band for all stars, which is the quoted
photometric precision for objects with K_p < 15, as measured
by the SCP (Brown et al. 2011). As the KIC required
excessive exposure times in the u band we excluded it from the
fitting process by selecting a large photometric uncertainty
for it. We also found that the J, H and K magnitudes had
little effect on the results, and thus excluded these bands
as well, to reduce the number of unnecessary fit parameters
and save CPU time (see Section 4.2).

Binary stars were treated as point sources, with a
magnitude in each filter band given by the sum of the fluxes of
the two stars in that filter band.
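
Summing the fluxes of the two components in a given band corresponds to the following combination of their magnitudes (a minimal sketch; the function name is ours):

```python
import math


def combined_magnitude(m1, m2):
    """Magnitude of an unresolved pair whose components have magnitudes m1, m2."""
    flux = 10.0 ** (-0.4 * m1) + 10.0 ** (-0.4 * m2)
    return -2.5 * math.log10(flux)


# Two equal stars appear 2.5 log10(2), about 0.75 mag, brighter than either one.
print(combined_magnitude(15.0, 15.0))
```

Because the flux ratio of the components differs from band to band, the colours of the unresolved pair generally differ from those of the primary alone, which is how a companion can influence the SCP fit.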



3.1.1 S/N determination 

Determining the expected S/N for an observation requires
knowledge of Kepler's noise characteristics. A model for these
exists in Bryson et al. (2010); however, the tools required are
not publicly available and therefore we re-derive them here.

To calculate the S/N expected for each synthetic system
from its K_p magnitude and the system's RA and DEC we
require a model of Kepler's focal plane geometry (FPG), as
described in the following.

To place the synthetic star on the focal plane we
obtained its pixel coordinates by extrapolating those of the
closest match in the actual Kepler data set, based purely on
the star's RA and DEC^2. Stars near the centre of the field
have almost circular pixel response functions (PRF), while
near the edge the PRFs are elongated towards the centre of
the field (Bryson et al. 2010). Thus the PRFs are both a
function of CCD and pixel location of the system. Bryson
et al. (2010) define a set of 5 PRFs for each CCD, four in
the corners and one in the centre. Each of these PRFs gives
the flux distribution over an n x n pixel array, with usually
n = 11 but occasionally n = 15, for a star centred in the
middle of the grid. To derive the PRF of a synthetic system
we linearly interpolate between the 2 nearest corner PRFs
and the central PRF.

1 http://www.cfa.harvard.edu/kepler/kic/kicindex.html

In this way we build up a full frame image (FFI) of
all synthetic stars in the Kepler field down to a limiting
magnitude of K_p = 19, which was chosen because the assumed
zodiacal light emission equates to a 19th magnitude star on
each pixel (Jenkins et al. 2004).

With the FFI in place we can determine the noise per
pixel, as described in Caldwell et al. (2010) and summarised
here. For each synthetic star we calculate the PRF and
subtract this from the FFI, to obtain an image of the system
on its own as well as of the background around the system,
including the zodiacal light. We convert the flux to electrons

f_{\rm kep} = 10^{-0.4 (K_p - 12)} \times f_{12} \qquad (7)

where f_12 = 1.74 x 10^5 e^- s^-1 is the photoelectric signal for
a G2 V star with K_p = 12 (Jenkins et al. 2010). We then
apply a smearing to each image, by summing the flux of
each pixel in each column, multiplying by the read time of
0.52 s, dividing by the number of rows and adding this to
each pixel.
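
The flux conversion and the smear step can be sketched as follows. The array convention (rows along the first axis, columns along the second) and the function names are assumptions of this sketch, and the smear is applied exactly as described above, with no further unit conversion.

```python
import numpy as np

F12 = 1.74e5      # e-/s for a G2 V star with K_p = 12
READ_TIME = 0.52  # s


def kp_to_electrons(kp):
    """Photoelectron rate in e-/s for a star of Kepler magnitude kp (equation 7)."""
    return F12 * 10.0 ** (-0.4 * (kp - 12.0))


def add_smear(image):
    """Add readout smear to image[row, column]: each column's summed flux, times
    the read time, divided by the number of rows, is added to every pixel in it."""
    image = np.asarray(image, dtype=float)
    n_rows = image.shape[0]
    smear = image.sum(axis=0) * READ_TIME / n_rows
    return image + smear[np.newaxis, :]


frame = np.zeros((4, 3))
frame[1, 2] = kp_to_electrons(14.0)
print(add_smear(frame))
```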

At this point we apply a saturation model by 'rolling
over' the electrons which are above the well depth (Caldwell
et al. 2010). This is done by performing a 50/50 split of the
overflowing electrons, moving half of them up the pixel
column and half down the pixel column, with each subsequent
overflow moving electrons in the same direction, until such
a point that the number of electrons per pixel is at most the
well depth (Van Cleve & Caldwell 2009). A charge transfer
efficiency model is then applied with a value of 0.99993 for
the parallel reads and 0.99995 for the serial reads (Van Cleve
& Caldwell 2009).
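
One possible implementation of this roll-over scheme for a single pixel column is sketched below. It is our own reading of the description, not the published detector model: charge that bleeds past either end of the column is simply discarded, and the charge transfer efficiency step is not included.

```python
import numpy as np


def bleed_column(column, well_depth):
    """Redistribute charge above well_depth along a pixel column.

    The excess of every saturated pixel is split 50/50; one half is passed up
    the column and the other half down, and any further overflow keeps moving
    in the same direction until every pixel holds at most well_depth.
    """
    col = np.asarray(column, dtype=float).copy()
    excess = np.clip(col - well_depth, 0.0, None)
    col -= excess
    n = len(col)
    for indices in (range(n), range(n - 1, -1, -1)):  # upward, then downward pass
        carry = 0.0
        for i in indices:
            total = col[i] + carry
            overflow = max(total - well_depth, 0.0)
            col[i] = total - overflow
            carry = overflow + 0.5 * excess[i]  # pass on overflow plus own half
    return col


print(bleed_column([0, 0, 0, 30, 0, 0, 0], well_depth=10))
```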



With both images now expressed in electrons and the
various systematics applied we calculate the S/N ratio for
each pixel using

{\rm S/N} = \frac{S}{\sqrt{S + Bg + \sigma_{\rm read}^2 + \sigma_{\rm quant}^2}} \qquad (8)

In this version of the CCD equation the signal S and
background Bg are given in electrons, while the read noise
σ_read ~ 100 e^- per read is CCD dependent (Van Cleve &
Caldwell 2009) and the quantisation error σ_quant is given by

\sigma_{\rm quant} = \frac{1}{\sqrt{12}} \left(\frac{W}{2^{N_{\rm bits}} - 1}\right) \qquad (9)

(Bryson et al. 2010). Here the CCD well depth W is of order
~10^6 e^- per pixel (though it is also CCD dependent, see
Van Cleve & Caldwell (2009)), and N_bits = 14 denotes the
number of bits the data is quantised to. This gives σ_quant ~
30 e^- per pixel. The pixels are ranked in order of decreasing
S/N and summed in quadrature, until the sum of the S/N
is maximised, thus defining the optimum aperture for the
star. This is repeated for each star with K_p ≤ 16.

2 http://keplergo.arc.nasa.gov/ContributedSoftwarePyKEP.shtml
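
The aperture selection can be sketched as follows. The text does not spell out how the ranked pixels are combined, so this sketch adopts one common interpretation as an assumption: pixels are added in order of decreasing per-pixel S/N, the aperture S/N is taken as the summed signal over the square root of the summed variance, and the aperture giving the largest value is kept. The noise defaults are the approximate values quoted above.

```python
import numpy as np


def pixel_variance(signal, background, sigma_read=100.0, sigma_quant=30.0):
    """Noise variance per pixel in e-^2, following the CCD equation (8)."""
    return signal + background + sigma_read**2 + sigma_quant**2


def optimal_aperture(signal, background, sigma_read=100.0, sigma_quant=30.0):
    """Return (boolean aperture mask, aperture S/N) for 2D signal/background."""
    signal = np.asarray(signal, dtype=float)
    var = pixel_variance(signal, np.asarray(background, dtype=float),
                         sigma_read, sigma_quant)
    # Rank pixels by their individual S/N, best first.
    order = np.argsort(signal / np.sqrt(var), axis=None)[::-1]
    s_cum = np.cumsum(signal.ravel()[order])
    v_cum = np.cumsum(var.ravel()[order])
    snr_cum = s_cum / np.sqrt(v_cum)
    n_best = int(np.argmax(snr_cum)) + 1
    mask = np.zeros(signal.size, dtype=bool)
    mask[order[:n_best]] = True
    return mask.reshape(signal.shape), float(snr_cum[n_best - 1])


star = np.array([[0.0, 1e4, 0.0], [1e4, 1e6, 1e4], [0.0, 1e4, 0.0]])
mask, snr = optimal_aperture(star, np.full((3, 3), 100.0))
print(mask.sum(), snr)
```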

The total photometric error σ_tot is obtained from the
S/N value, scaled by the total number of individual
integrations while the system was in transit over the envisaged
3.5 years of the mission. This number is the product of the
270 integrations co-added together in one long-cadence (30
min) observation, the number N_sample of long cadence
observations that fit in a single transit, and the number N_tr of
transits in 3.5 years. We thus have

\sigma_{\rm tot} = \frac{N}{S \times \sqrt{270\, N_{\rm sample}\, N_{\rm tr}}} \qquad (10)

For randomly distributed inclinations of circular orbits the
average transit duration is t_0 π/4, where t_0 is the duration
of a transit that is central across the star. Thus we have
N_sample = t_0 (π/4) / 30 min. The central transit duration is
calculated as

t_0 = 2 R_* \sqrt{\frac{a}{G M_*}} \qquad (11)

with the stellar mass M_* derived from the SCP, and with
the semi-major axis a taken at three different locations, 5 R_*,
0.5 H_* and H_*. The quantity H_* is the characteristic distance
of the habitable zone (HZ) for the star in consideration and
is given by 0.95 (L_*/L_⊙)^(1/2) AU (Batalha et al. 2010).

The final term required for evaluating equation (6) is the
crowding metric, r, which is given by (Batalha et al. 2010)

r = \frac{F_*}{F_* + F_{\rm bg}} \qquad (12)

where F_* is the flux from the star in the optimal aperture
before addition of the systematics, and F_bg is the flux from
the background in the optimal aperture before addition of
the systematics but after the zodiacal light has been added.
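
The chain from transit duration to minimum detectable planet radius (equations 6 and 10 to 12) can be sketched as below. All quantities are in SI units and the physical constants are rounded standard values. The number of transits is not given explicitly in the text, so as an assumption we take it to be the mission length divided by the orbital period from Kepler's third law (planet mass neglected, non-integer values allowed).

```python
import math

G = 6.674e-11        # m^3 kg^-1 s^-2
M_SUN = 1.989e30     # kg
R_SUN = 6.957e8      # m
R_EARTH = 6.371e6    # m
AU = 1.496e11        # m
YEAR = 3.156e7       # s
LONG_CADENCE = 1800.0    # s (30 min)
N_COADD = 270            # integrations per long cadence
MISSION = 3.5 * YEAR     # s


def central_transit_duration(r_star, m_star, a):
    """Equation (11): duration of a central transit on a circular orbit."""
    return 2.0 * r_star * math.sqrt(a / (G * m_star))


def orbital_period(a, m_star):
    """Kepler's third law (assumption: planet mass neglected)."""
    return 2.0 * math.pi * math.sqrt(a**3 / (G * m_star))


def sigma_tot(noise_to_signal, r_star, m_star, a):
    """Equation (10): noise after co-adding all in-transit integrations."""
    t0 = central_transit_duration(r_star, m_star, a)
    n_sample = t0 * (math.pi / 4.0) / LONG_CADENCE
    n_tr = MISSION / orbital_period(a, m_star)
    return noise_to_signal / math.sqrt(N_COADD * n_sample * n_tr)


def crowding_metric(f_star, f_bg):
    """Equation (12): fraction of the aperture flux that comes from the target."""
    return f_star / (f_star + f_bg)


def r_p_min(r_star, sig_tot, r):
    """Equation (6): smallest planet radius detectable at the 7.1 sigma level."""
    return r_star * math.sqrt(7.1 * sig_tot / r)


s = sigma_tot(1.0 / 500.0, R_SUN, M_SUN, AU)
print(r_p_min(R_SUN, s, crowding_metric(9.0, 1.0)) / R_EARTH)
```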

3.2 Testing the target selection code 



With R_p,min calculated for each synthetic star in the
field-of-view we can draw up a ranked list of stars in order of
increasing R_p,min. The subset of systems with a detectable
terrestrial sized transit in the habitable zone, i.e. R_p,min ≤ 2 R_⊕
(where R_⊕ denotes the radius of the Earth) includes a large
number of objects, ~60%, that are too faint (K_p > 15.0) for
radial velocity follow up. Thus an additional prioritisation
scheme is employed, the details of which are given in Table 1
of Batalha et al. (2010). In essence, the highest priority stars
are those with R_p,min ≤ 2 R_⊕ in the HZ, with a magnitude
bright enough to perform high-precision radial velocity
measurements (K_p ≤ 14), followed by those with 14 < K_p < 16.
Then there are those with detectable Earth-sized planets at a = 0.5 H_*
or a = 5 R_* (these deliver a larger number of transits over the
lifetime of the mission), and finally those with R_p,min < 2 R_⊕
in the HZ around the faintest stars. Batalha et al. (2010)
divide the sample into 13 classification groups, with the 11
highest priority groups making up the target list.

To test the target selection code we apply it to the
actual KIC and compare the target list we obtain with the
Kepler Quarter 2 (Q2) data set, which we use as a proxy for
the actual Kepler target list. We chose Q2 as the catalogue
of objects defined in Batalha et al. (2010) is not publicly
available, and because both Quarter 0 and Quarter 1 were
affected by commissioning of the Kepler instrument. In Q0
only ~50,000 stars were observed (Borucki et al. 2011),
while Q1 had over-sized apertures (Borucki et al. 2011), thus
leading to a reduction in the number of faint, K_p = 15-16, stars.
We chose not to use later quarters either because after each
quarter some targets are removed due to follow-up work, or
added due to the guest observation program. Tenenbaum
et al. (2012) show that in the first 12 quarters, 60% of
objects were observed for all 12 quarters. A further 15% were
observed for 10 quarters; these predominantly are systems
falling on the CCD module that failed during quarter 4, and
thus were only observable for 75% of the time.

We show the magnitude distributions of our calculated
target list and of the actual target list in Fig. 1a. The
discrepancy seen is primarily due to giant stars, here defined
as stars with log g < 3.5, highlighted in Fig. 1b. We could
not attribute these differences to inadequacies in our
implementation of the target selection and SCP code and rather
suspect that at least some differences exist because the
actual Q2 list will have some objects added or removed from
the original list of objects as defined in Batalha et al. (2010).



To achieve a better agreement we applied a series of
ad-hoc corrections to our target selection criteria:

(i) For faint objects (14 < K_p < 16), if the criterion is
R_p,min < 2.0 R_⊕ for a = H_*, we redefine the selection
criterion to R_p,min ≤ 2.4 R_⊕. This increases the number of
faint dwarfs.

(ii) All objects that saturate at least one pixel are
included, if they have not already been placed into one of
the groups in Batalha et al. (2010). This predominantly
increases the number of bright giants.

(iii) All objects with 3 < R/R_⊙ < 10 and magnitude
K_p < 14 are included, if they have not already been placed
into one of the groups in Batalha et al. (2010). This is purely
ad-hoc and is designed to increase the number of bright
giants.

With these corrections in place we consider the match
between the reproduced and actual target list satisfactory (see
Fig. 1c) and sufficient for the study of system properties
presented in the following sections.



4 RESULTS 

We now present the synthetic Kepler field population, cov- 
ering both the synthetic KIC and the synthetic target cat- 
alogue which emerges from it. We will compare the actual 
physical properties of the synthetic stars with the proper- 
ties these stars appear to have when analysed with the SCP 
method. 



4.1 Sample size 

Using the default population synthesis parameters described above we obtain a total of ~353,000 objects in the synthetic KIC, compared to the ~416,000 objects in the real KIC. Increasing the Galaxy-wide SFR from the default value of 1 star yr$^{-1}$ with $M > 0.8\,M_\odot$ to 1.2 stars yr$^{-1}$ increases the number of systems in the synthetic sample to ~425,000. Changing the global scale factor in this way to achieve a better match with the observed KIC does not affect the relative distribution of the stars in the synthetic sample, but it can play a role in the target selection due to its effect on the background flux. For the following work we use the increased value of the Galaxy-wide SFR.

Figure 1. Normalised magnitude distribution of stars selected by our procedure (blue). Left: all systems, no ad-hoc corrections. Middle: only giants, defined as objects with KIC $\log g < 3.5$, no ad-hoc corrections. Right: only giants, but with ad-hoc corrections. In all panels the actual Q2 target catalogue is shown in green.


4.2 Distribution in colour-colour diagram 

The distribution of KIC objects with $K_p \le 16.0$ in the $r - i$ vs $g - r$ colour-colour diagram is shown in Fig. 2. The left panel shows the synthetic KIC (Fig. 2a) while the right panel displays the real KIC (Fig. 2c). In (g-r)-(r-i) colour space, effective temperature decreases from left to right and metallicity acts essentially perpendicular to the main band of systems, with higher metallicities having lower $r - i$. The fork at $g - r \sim 1.5$ is where the dwarfs (top branch) split from the giants (lower branch), and is located at $T_{\rm eff} \sim 3500$ K. The distributions of the synthetic and real KIC agree reasonably well in overall shape. However, we found that when we applied the SCP code to the synthetic sample, the resulting derived physical parameters were very sensitive to the precise location of the stars in the colour-colour diagram.

We therefore implemented a set of corrections to force a yet better agreement between the colour-colour distributions, the result of which can be seen in the middle panel of Fig. 2. We applied a set of three correction terms: a linear offset in each filter band, a colour-dependent term, and a Gaussian perturbation in each filter band. The rationale for this approach is provided by Pinsonneault et al. (2012), who found a linear offset and a colour-dependent difference term when comparing the magnitudes measured by the KIC and by the SDSS. The Gaussian perturbation applied to all magnitudes, on the other hand, acts to widen the main band in the colour-colour diagram, mimicking a more realistic, continuous metallicity distribution (rather than a bimodal one) and the effect of photometric uncertainties.

Filter   Linear offset (l)   Colour term (c)       $\sigma^2$
g        -0.01819            0.02535 (g - r)       0.01921
r        -0.01192            0.05728 (r - i)       0.0
i        -0.02209            0.09656 (r - i)       0.00995
z        -0.01313            0.08599 (i - z)       0.02611
D51      -0.0222             -0.0571 (r - D51)     0.00001

Table 2. Correction terms applied to the calculated KIC magnitudes according to $X'_j = X_j + l + c + \sigma^2 \phi$, where $X_j$ is the magnitude in filter bands $j = g, r, i, z, D51$ and $\phi$ is a random number drawn from a standard normal distribution.
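The correction formula of Table 2 can be sketched as follows. This is a minimal illustration rather than the code used in this work: the function and variable names are hypothetical, and it assumes that the colour terms are evaluated from the uncorrected magnitudes and that a single random draw $\phi$ is shared by all filters of a given object.

```python
import random

# Per filter: (linear offset l, colour coefficient, colour pair, sigma^2), from Table 2.
COEFFS = {
    "g":   (-0.01819,  0.02535, ("g", "r"),   0.01921),
    "r":   (-0.01192,  0.05728, ("r", "i"),   0.0),
    "i":   (-0.02209,  0.09656, ("r", "i"),   0.00995),
    "z":   (-0.01313,  0.08599, ("i", "z"),   0.02611),
    "D51": (-0.0222,  -0.0571,  ("r", "D51"), 0.00001),
}


def correct_magnitudes(mags, rng):
    """Apply X'_j = X_j + l + c + sigma^2 * phi to one object's magnitudes.

    mags: dict mapping filter name to uncorrected magnitude.
    rng:  a random.Random instance supplying the standard-normal draw.
    """
    phi = rng.gauss(0.0, 1.0)  # one draw per object, shared by all filters
    corrected = {}
    for band, (offset, coeff, (a, b), sigma2) in COEFFS.items():
        # Assumption: the colour is taken from the uncorrected magnitudes.
        colour_term = coeff * (mags[a] - mags[b])
        corrected[band] = mags[band] + offset + colour_term + sigma2 * phi
    return corrected
```

Because the $r$ band has $\sigma^2 = 0$, its corrected value is deterministic, which gives a convenient check of an implementation.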



We tested the corrections presented in Pinsonneault et al. (2012) to translate our SDSS-based magnitudes into KIC-based magnitudes; however, these led to unsatisfactory fits in the resulting SCP-derived parameters. This suggests that the effects of a systematic shift between SDSS and KIC magnitudes, the simplified metallicity distribution, and superimposed photometric uncertainties cannot be separated into three independent corrections that would stand on their own. Hence the numerical values of the corrections we derived here are particular to our model and would not be suitable for other models, but the technique we followed may be useful to others.

To derive the corrections we applied a least squares minimisation procedure, fitting the linear offset and colour terms simultaneously, using the distributions of the synthetic and real KIC in the following colour-magnitude diagrams: $g$ vs $(g - r)$, $r$ vs $(r - i)$, $i$ vs $(r - i)$, $z$ vs $(i - z)$, and $D51$ vs $(r - D51)$. For the Gaussian terms we also used a least squares minimisation procedure to find the width for each filter band, fitting in colour-colour space. For each system we drew a random number from a standard normal distribution, using the same random number for each filter, and scaled it by the estimated width of a Gaussian centred on the magnitude derived for the object in question. This was performed for the colour-colour distributions in $(g - r)$ vs $(r - i)$ and $(z - r)$ vs $(r - D51)$, while not allowing $r$ to vary, to derive Gaussian width coefficients for $g$, $i$, $z$ and $D51$. The procedure leads to the coefficients quoted in Table 2.

Comparing the three colour-colour diagrams in Fig. 2 we can see that the corrections have had the desired effect. The agreement between the corrected synthetic sample (middle panel) and the reference sample (right panel) has improved in two important respects: the location of the peak density is better matched, and the width of the main band has increased. There is still room for improvement: for example, there appear to be too many objects with $g - r \le 0.6$ in the synthetic sample, which would translate into too many 'hot' dwarfs after target selection, and there is a lack of the reddest dwarfs, with $r - i > 1.5$. Nevertheless, the bulk features of the synthetic sample are in satisfactory quantitative agreement with the reference sample for the purpose of the analysis presented below.



4.3 Stellar parameter distribution 

Based on the corrected magnitudes we subjected all objects in the synthetic sample to analysis with the SCP code, and thus determined their 'apparent' physical parameters, as obtained by the SCP. Thus we can compare the actual physical properties (as determined by our population model) and the SCP-derived properties of synthetic KIC stars, and check if there are significant differences between the two. By implication, we expect that any such differences would also be present in the real KIC. For this comparison we focus on the distribution of synthetic KIC stars in the $\log T_{\rm eff}$ - $\log g$ diagram, as these are the most reliable parameters derived from the SCP.

We first present the distribution of the actual parameters of the synthetic sample (Fig. 3), broken up by evolutionary type. For the binaries in the sample we show the location of the primary star (except in panel e, see below). The systems occupy a region with a bird-like shape with two prominent 'wings' and a long 'neck' towards large $g$ and small $T_{\rm eff}$. The location of this region is outlined in black in panels a-e of the figure. The 'neck' in fact consists of two narrow, essentially parallel branches which result from the bimodal metallicity distribution in our population model. The lower branch is occupied by the lower metallicity, $Z = 0.0033$, main sequence (MS) stars, while the solar metallicity MS stars are in the upper branch. The high $T_{\rm eff}$ 'wing' is composed of higher-mass MS stars while the other, lower $T_{\rm eff}$ 'wing' is composed solely of evolved stars.

Panel a of Fig. 3 shows the distribution of MS stars ('dwarfs'), while panel b shows Hertzsprung gap and giant branch (GB) systems. In panel c we display core helium burning (CHe) systems, and panel d shows asymptotic and thermally pulsing giant branch (AGB) systems. In Fig. 3e we show the distribution of the secondary components in binary systems; comparing with Fig. 3a, and in particular with the black outline, we see that in general the secondaries are more clustered at the low $T_{\rm eff}$, high $g$ end of the diagram. This implies that they in general have a lower mass or are less evolved than their primary companions, reflecting the fact that they were the lower mass component at the birth of the binary.

Figure 3f shows the distribution of white dwarfs (WD) that are in a binary system. The synthetic sample contains no single WDs, but there are a very small number of binaries with a neutron star component (249 for the adopted input parameter set). We do not investigate the distribution of these NS systems further as our model currently treats them in a simplistic way.

We now turn to the corresponding distribution of the synthetic sample over the 'apparent', SCP-determined values for $\log T_{\rm eff}$ and $\log g$, shown in Fig. 4. To aid the comparison with the previous figure a grey-shaded area indicates the region the synthetic sample occupies in Fig. 3.

Panels a-d in Fig. 4 display the same stellar subtypes as panels a-d of Fig. 3. We can see that the 'neck', made up of low-mass MS stars, is wider in Fig. 4a than in Fig. 3a, and is clearly not bimodal. The 'neck' is also at roughly constant $g$, while in the actual parameter space (Fig. 3a) $g$ increases with decreasing $T_{\rm eff}$. The 'wing' of higher-mass, more evolved MS systems (towards large $T_{\rm eff}$) in Fig. 4a is shorter than its analogue in Fig. 3a. Comparing panel b in Figs 4 and 3 reveals that giant branch stars extend over a similar range in $T_{\rm eff}$ and $g$; however, in Fig. 4b some of the giants appear at low $T_{\rm eff}$ along the 'neck', with a small gap between the bulk of the GB stars and these outliers. The CHe systems in Fig. 4c are less constrained in $T_{\rm eff}$-$g$ space than in Fig. 3c, while also having a small population in the 'neck'. Finally, the AGB systems in Fig. 4d appear mostly in the 'neck' rather than in the expected low $g$ 'wing' as seen in Fig. 3d.

Figure 2. Distribution of objects in the $r - i$ vs $g - r$ colour-colour diagram. (a) The synthetic KIC sample obtained by population models; (b) the synthetic KIC sample after the application of magnitude correction terms; (c) the real KIC Q2 data set.

The MS stars, or 'dwarfs' (Figs 3a & 4a), are well constrained by the requirement $\log g > 3.5$. However, the more evolved objects (Figs 3b-d & 4b-d) are not constrained by $\log g$ alone. A selection based purely on $\log g$ will therefore be able to include or exclude dwarfs, but not giants. This has ramifications for the bulk characteristics of the exoplanet candidate systems (Gaidos & Mann 2013).



There is no analogous version for panel e of Fig. 3 as the SCP treats all objects as single stars. Instead Fig. 4e shows how systems with a WD component would appear after the SCP analysis. We find that the resulting distribution is not significantly different from systems without a WD, confirming that there is no systematic way to identify WD systems from KIC parameters alone. This lack of difference is due to the fact that the WD's luminosity is at least a factor of 100 less than its companion's luminosity, thus its flux is negligible for the colour bands that determine the solution in $T_{\rm eff}$ and $g$.
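To put the factor of 100 into magnitudes, the following sketch computes how much a faint companion brightens the combined light. It is illustrative only and makes a simplifying assumption that is not in the text: the luminosity ratio is applied directly as the flux ratio in a single band, whereas the true ratio will differ from band to band.

```python
import math


def magnitude_shift(flux_ratio: float) -> float:
    """Change in magnitude when a companion with flux_ratio = F_comp / F_star
    is added to the light of the star (negative means brighter)."""
    return -2.5 * math.log10(1.0 + flux_ratio)


# A companion contributing 1% of the star's flux in a given band:
print(f"{magnitude_shift(0.01):.4f} mag")
```

Under this assumption a 1% contribution changes the combined magnitude by roughly 0.011 mag, and the effect on a colour is smaller still wherever the contribution is similar in both bands.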

Finally, Fig. 4f shows the real KIC stars (for Q2), with the black contour outlining the distribution of our SCP-processed synthetic KIC, demonstrating satisfactory agreement in terms of overall shape and distribution. The only significant difference remaining is the lack of a continuous giant branch track towards the lowest $g$ values.



4.4 Post-target selection distributions 

After applying the target selection code described in Section 3.1 to the synthetic population of stars we can investigate how the target selection criteria affect the different evolutionary types of systems compared to their intrinsic distribution.

In terms of total number of objects, the synthetic KIC sample is made up of 424,511 objects (208,697 single stars and 215,814 binary systems). This is reduced to 214,747 objects (104,663 single and 110,084 binaries) after target selection. The real KIC data set contains 405,789 stars while the Q2 catalogue contains 165,434 objects. Thus the specific synthetic sample we chose to work with has 5% more objects than the KIC to begin with, and 20% more objects after target selection compared to the Q2 data set. The pre-target selection number of objects could be matched perfectly by fine-tuning the underlying global Galactic SFR, but this would not affect the fraction of stars being selected as targets (~50% for the synthetic sample vs ~40% for the Q2 stars).

We find that the binary fraction of our sample remains largely unaltered near the 50% level after the application of the target selection; we therefore conclude that the target selection procedure does not select binaries differently from single stars. The synthetic sample contains slightly more binaries than single stars because binaries are inherently more luminous, so a magnitude-limited sample probes a larger volume of the Galaxy for them; however, this difference is negligible.
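The volume argument can be made concrete with an idealised scaling. The sketch below assumes a uniform space density and no extinction, neither of which holds in the Kepler field, so it gives an upper limit rather than the effect seen in the synthetic sample; the function name is hypothetical.

```python
def volume_boost(luminosity_ratio: float) -> float:
    """Factor by which the volume probed by a magnitude-limited sample grows
    for a source that is luminosity_ratio times more luminous.

    At fixed limiting flux the maximum distance scales as L**0.5,
    and the enclosed volume as d_max**3, hence L**1.5.
    """
    return luminosity_ratio ** 1.5


# An unresolved binary with two identical components is twice as luminous:
print(f"{volume_boost(2.0):.2f}")
```

In this idealised case an equal-luminosity binary would be sampled from about 2.8 times the volume of a single star. Most binaries have a much fainter secondary, and extinction and the finite scale height of the disc limit the accessible volume, which is consistent with the excess of binaries in the sample being small.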

Tables 3 and 4 show how the relative contributions from the different stellar and binary types change after the target selection. The relative fraction of MS and MS+MS objects increases by ~10%, while the fraction of systems containing a giant decreases by ~40%. The original aim of the Kepler target selection was to prioritize Sun-like stars (Batalha et al. 2010), while also removing giant stars, for which Earth-sized transits are harder to detect (Borucki et al. 2011). Our analysis shows that the target selection largely succeeded in this goal, and our simulations allow one to quantify the bias this procedure introduces to the stellar sample.

Figure 3. The distribution of the synthetic KIC sample over the $\log T_{\rm eff}$ - $\log g$ plane, for different system types. $T_{\rm eff}$ and $g$ are the actual physical parameters of the population model stars. In the case of binaries the location of the primary is shown in panels a-d. The black contour outlines the region occupied by the combined synthetic sample. (a) Main-sequence (MS) stars ('dwarfs'); (b) Hertzsprung gap and giant branch (hereafter GB) stars; (c) core helium burning (CHe) stars; (d) asymptotic and thermally pulsing giant branch (AGB) stars; (e) secondary components of a binary system, excluding systems containing a white dwarf (WD) or neutron star (NS); (f) WDs (these are all in binaries; there are no single WDs in the synthetic sample).

                     Singles
Type            Pre        Post       Relative difference
MS              73.7%      79.6%      +8.0%
GB              15.7%      10.2%      -35%
CHe             10.1%      9.4%       -6.9%
AGB             0.4%       0.6%       +50%
Total number    208,697    104,664    -50%

Table 3. The relative distribution of stellar types among the single stars in the synthetic sample, before and after target selection.
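The 'Relative difference' column of Tables 3 and 4 appears to be the fractional change in each type's share of the sample, i.e. (post - pre)/pre. The short sketch below, with a hypothetical function name, applies that definition to the single-star percentages of Table 3.

```python
def relative_difference(pre_percent: float, post_percent: float) -> float:
    """Fractional change in a type's share of the sample, in per cent."""
    return 100.0 * (post_percent - pre_percent) / pre_percent


# Pre- and post-selection shares of single stars, taken from Table 3.
TABLE3 = {"MS": (73.7, 79.6), "GB": (15.7, 10.2), "CHe": (10.1, 9.4), "AGB": (0.4, 0.6)}

for kind, (pre, post) in TABLE3.items():
    print(f"{kind:4s} {relative_difference(pre, post):+.1f}%")
```

With this definition the tabulated values of +8.0%, -35%, -6.9% and +50% are recovered, so the large relative change for AGB stars reflects a change of only 0.2 percentage points.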

The fraction of single CHe stars is almost unchanged after the target selection, most likely because most of them are misclassified into the dwarf region of $\log T_{\rm eff}$ - $\log g$ space. The fraction of single AGB stars increases by 50% but is overall very small. The CHe+MS binary systems are also unaffected by the target selection, while binaries containing an AGB star, or CHe+GB systems, are too rare to draw conclusions from.

                     Binaries
Type            Pre        Post       Relative difference
MS & MS         68.1%      74.8%      +9.8%
GB & MS         11.7%      6.3%       -46%
WD & MS         8.0%       8.5%       +6.3%
CHe & MS        5.7%       5.5%       -3.5%
WD & GB         3.3%       1.7%       -49%
WD & CHe        2.0%       1.85%      -7.5%
GB & GB         0.28%      0.17%      -39%
Total number    215,814    110,084    -49%

Table 4. The relative distribution of binary classes in the synthetic sample, before and after target selection. Note that this list has been truncated; the remaining types make up < 0.2% individually and 1% combined.

The change in the contribution of binaries containing a WD depends on the nature of the WD's companion. WD+MS systems are essentially unaffected. The fraction of WD+GB systems, on the other hand, is almost halved, which is again a consequence of the fact that the giant dominates the combined flux in the griz magnitudes, and thus the SCP parameter estimation is not significantly altered by the presence of the WD.

Figure 4. The distribution of $\log T_{\rm eff}$ and $\log g$ derived for the synthetic sample using the procedure in the SCP, for different stellar types: (a) MS stars, (b) GB stars, (c) CHe stars, (d) AGB stars and (e) systems containing a WD. Panel (f) shows the distribution of the Q2 data set from the SCP. In all panels the contour indicates the region covered by the combined synthetic distribution, while the grey-shaded area indicates the region covered by the distributions seen in Fig. 3.



4.5 Effect of target selection 

We visualise the impact of the target selection in the $\log T_{\rm eff}$ - $\log g$ diagram by showing the ratio of the number of systems per ($\log T_{\rm eff}$, $\log g$) bin post- to pre-target selection, for three different samples. Figure 5 compares the actual Q2 target list with the real KIC, Fig. 6 considers our synthetic sample in SCP parameters, and Fig. 7 in real parameters.
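The per-bin ratio can be sketched as follows. This is an illustration rather than the code behind Figs 5-7: the bin widths and function names are hypothetical, and bins that are empty before selection are simply omitted.

```python
import math
from collections import Counter


def bin_key(log_teff, log_g, teff_width=0.05, logg_width=0.25):
    """Integer bin indices in the (log Teff, log g) plane (widths are illustrative)."""
    return (math.floor(log_teff / teff_width), math.floor(log_g / logg_width))


def selection_ratio(pre, post):
    """Ratio N_post / N_pre per bin.

    pre, post: iterables of (log_teff, log_g) pairs before and after
    target selection. Only bins populated before selection are returned.
    """
    n_pre = Counter(bin_key(t, g) for t, g in pre)
    n_post = Counter(bin_key(t, g) for t, g in post)
    return {key: n_post.get(key, 0) / count for key, count in n_pre.items()}
```

A ratio near 1 marks a region that is almost fully selected, while a ratio near 0 marks a region that the selection removes.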

As can be seen from Fig. 5 the target selection increased the fraction of cool dwarfs and decreased the fraction of the hotter dwarfs and of giants. The change in the density between the 'neck' and the 'wings' is due to objects in the 'neck' having $N_{\rm tr} > 3$ for objects in their HZ.

The synthetic sample in SCP-derived parameters (Fig. 6) has a population of dwarfs in the 'neck' which is comparable to that in Fig. 5. The population of target-selected objects in the high $T_{\rm eff}$ 'wing' partially matches that found in Fig. 5, though we have many more objects there. They have SCP masses of ~1-2 $M_\odot$ and radii of ~1.5-4 $R_\odot$, allowing the detection of a planet at $5\,R_\odot$ that would transit 3 times in 3.5 yrs. The giants in the low $g$ 'wing' are again more marked relative to the real KIC. These are partly made up of giants that have survived the target selection criteria of Batalha et al. (2010) and partly due to the ad-hoc correction we applied to increase the number of objects with real radii $3 < R/R_\odot < 10$ (these are predominantly CHe stars). The population of giants at the lowest $g$ values is due to the ad-hoc correction that adds objects that saturate at least one pixel.

Figure 7 finally reveals how the synthetic sample is target-selected as a function of actual, physical parameters. The population of dwarfs in the 'neck' of Fig. 7 matches well with the population in the 'neck' of Fig. 6. The 'hot' dwarfs are still present in Fig. 7. Note that the large number of target-selected objects in the GB and AGB 'wing' is due to their misclassification by the SCP (objects seen in Fig. 4b-d in the 'neck'). They have SCP-derived $\log g$ values of 4.2-4.6, which implies an SCP-derived mass $M = 0.5$-$0.8\,M_\odot$; hence these objects were in fact classified into our highest priority target group. The overpopulation of giants noted in Fig. 5 is less pronounced in Fig. 7, but here they reside in the CHe region (see Fig. 3c) and the extreme end of the AGB region (see Fig. 3d).

Figure 5. A comparison of the distribution of systems before and after the target selection, in $\log T_{\rm eff}$ - $\log g$ space, for the actual Q2 star sample.

Figure 6. Same as Fig. 5 but for the synthetic sample, using SCP-derived parameters.

Figure 7. Same as Fig. 5 but for the synthetic sample, using their correct, physical parameters.



4.6 Comparison of SCP and physical parameters 

For a closer inspection of the differences between the real, physical parameters and the SCP-derived parameters we compare the synthetic sample and the KIC before target selection, as this maximises the number of objects to derive results from. For each object in the sample we determine the difference between the real and SCP-derived effective temperature, $\Delta T_{\rm eff} = T_{\rm eff,real} - T_{\rm eff,SCP}$, and surface gravity, $\Delta\log g = \log g_{\rm real} - \log g_{\rm SCP}$. In the case of a binary system only the primary star is considered. We then adopt a suitable binning of the $\log T_{\rm eff}$ - $\log g$ plane and determine, for each bin, the median values of $\Delta T_{\rm eff}$ and $\Delta\log g$ for all objects that fall into a given bin.
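The binned medians can be sketched as below. This is a minimal illustration and not the analysis code used here: the bin widths and names are hypothetical, and it assumes that objects are assigned to bins according to their real parameters, which the text does not state explicitly.

```python
import math
from collections import defaultdict
from statistics import median


def median_offsets(stars, teff_width=0.05, logg_width=0.25):
    """Median (dTeff, dlogg) per (log Teff, log g) bin.

    stars: iterable of (teff_real, teff_scp, logg_real, logg_scp), Teff in K.
    Returns {bin indices: (median dTeff in K, median dlogg in dex)}.
    """
    d_teff = defaultdict(list)
    d_logg = defaultdict(list)
    for teff_real, teff_scp, logg_real, logg_scp in stars:
        # Assumption: bins are defined on the real (physical) parameters.
        key = (math.floor(math.log10(teff_real) / teff_width),
               math.floor(logg_real / logg_width))
        d_teff[key].append(teff_real - teff_scp)   # real minus SCP
        d_logg[key].append(logg_real - logg_scp)   # real minus SCP
    return {key: (median(d_teff[key]), median(d_logg[key])) for key in d_teff}
```

Using the median rather than the mean keeps each bin's value robust against the small number of strongly misclassified objects.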

Figure 8 shows $\Delta T_{\rm eff}$ as a function of $\log T_{\rm eff}$ and $\log g$. The largest differences are seen in the hottest dwarfs. This is not unexpected as the SCP had a $T_{\rm eff}$ limit of 50,000 K (Brown et al. 2011). Figure 9 displays the distribution of $\Delta\log g$ over the $\log T_{\rm eff}$ - $\log g$ plane. The population of giants that are in the 'neck' and misclassified as dwarfs is clearly visible, having the largest $\Delta\log g$.

Tables 5 and 6 show the median values of $\Delta T_{\rm eff}$ and $\Delta\log g$ across the whole parameter space, and the corresponding standard deviation, $\sigma$, binned by evolutionary type. MS systems (MS, MS+MS & WD+MS) have the largest values of $\Delta T_{\rm eff} \sim 500$ K as well as the largest standard deviations, which is caused by the hot dwarfs. The evolved systems (GB, CHe & AGB-containing systems) all have relatively small values, $\Delta T_{\rm eff} < 100$ K. Although the standard deviations are still much larger than the median values, they are < 1/2 those of the dwarfs.

                     Singles
         $\Delta T_{\rm eff}$ [K]      $\Delta\log g$ [dex]
Type     Median     $\sigma$           Median     $\sigma$
MS       492        918                -0.23      0.39
GB       61         197                -0.42      0.96
CHe      74         214                -1.01      0.67
AGB      -23        3758               -3.01      1.14

Table 5. The median values of the differences $\Delta T_{\rm eff}$ and $\Delta\log g$ (with the corresponding standard deviation $\sigma$), between the real, physical parameters and the SCP-derived parameters for our synthetic single stars.

                     Binaries
             $\Delta T_{\rm eff}$ [K]      $\Delta\log g$ [dex]
Type         Median     $\sigma$           Median     $\sigma$
MS & MS      558        931                -0.24      0.41
GB & MS      56         268                -0.48      0.72
WD & MS      471        615                -0.29      0.38
CHe & MS     58         229                -1.06      0.65
WD & GB      53         175                -0.47      0.78
WD & CHe     -6         708                -3.10      0.98
GB & GB      -8         258                -0.5       0.82

Table 6. Same as Table 5 but for binaries and only considering the primary star.

The dwarfs have the better estimates for $\log g$, with values around $\Delta\log g \sim -0.25$, while evolved systems have $\Delta\log g$ values spread from $-0.5$ to $-3.10$ with standard deviations approximately twice those of the dwarf systems.

Comparing the single stars with their binary counterparts, there are small differences in the median shift; however, we consider these too small to be statistically significant. We therefore conclude that the SCP has treated the binaries in a similar fashion to the single stars.

As on average the SCP-derived $g$ is smaller than the real $g$, the SCP will therefore also return a radius that is larger than the real radius, and consequently any derived planet radius will be larger as well. If $\log g$ for a MS star is underestimated by the average value of 0.23 dex the implied stellar radius is too large by ~3% and hence for a measured transit depth $\Delta F/F = (R_p/R_*)^2$ the planet radius $R_p$ is overestimated by 3%. While confirmed Kepler planets will have stellar radii determined by other means, usually by spectroscopy (Batalha et al. 2011), most systems are too faint, and they are too numerous, for affordable, individual follow-up (Batalha et al. 2010); thus their radii will be uncorrected in the first instance and any derived planetary distributions skewed.
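The link between the stellar and planetary radius follows directly from the transit depth relation. The sketch below is illustrative: the function name is hypothetical and the conversion factor between solar and Earth radii is an approximate value that is not taken from this work.

```python
import math

R_SUN_IN_R_EARTH = 109.1  # approximate solar radius in Earth radii


def planet_radius_rearth(depth: float, r_star_rsun: float) -> float:
    """Planet radius in Earth radii from the transit depth dF/F = (R_p/R_*)^2."""
    return r_star_rsun * R_SUN_IN_R_EARTH * math.sqrt(depth)


# The same transit depth interpreted with a stellar radius that is 3% too large:
true_rp = planet_radius_rearth(1.0e-4, 1.00)
biased_rp = planet_radius_rearth(1.0e-4, 1.03)
print(f"{biased_rp / true_rp:.2f}")
```

Because the depth fixes only the ratio $R_p/R_*$, any fractional error in the adopted stellar radius carries over one-to-one to the planet radius, independent of the depth itself.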

Other authors find similar results. In the SCP paper, Brown et al. (2011) compared the KIC estimates for some 35 stars with spectroscopic measurements, and noted for dwarfs with $T_{\rm eff} = 4500$-6500 K a temperature difference $\Delta T_{\rm eff} = \pm 200$ K and surface gravity difference $\Delta\log g = -0.4$ dex. Sampling our synthetic dwarfs over this $T_{\rm eff}$ range we find a median value of $\Delta T_{\rm eff} = +423$ K with $\sigma = 231$ K and $\Delta\log g = -0.14$ dex, $\sigma = 0.33$ dex.

Figure 8. Distribution of the median of the difference between real and SCP-derived effective temperature, $\Delta T_{\rm eff} = T_{\rm eff,real} - T_{\rm eff,SCP}$, per bin of $\log T_{\rm eff}$ - $\log g$.

Figure 9. Distribution of the median of the difference between real and SCP-derived surface gravity, $\Delta\log g = \log g_{\rm real} - \log g_{\rm SCP}$, per bin of $\log T_{\rm eff}$ - $\log g$.

Pinsonneault et al. (2012) modelled SDSS stars in the Kepler field with the infrared flux method (IRFM) and derived an average deviation of $\Delta T_{\rm eff} = +215 \pm 100$ K for dwarf stars between 4000 K and 6500 K. Taking our systems over a similar temperature range and only considering dwarfs we find a median $\Delta T_{\rm eff} = +413$ K, $\sigma = 240$ K, consistent with Pinsonneault et al. (2012).



Mann et al. (2012) found, from medium-resolution spectra of 382 stars, $\Delta T_{\rm eff} \approx -110$ K for dwarfs and $\Delta T_{\rm eff} \approx -150$ K for giants. Following their selection of objects with $K_p - J > 2.0$ our synthetic sample gives $\Delta T_{\rm eff} = -140$ K, $\sigma = 116$ K for dwarfs and $\Delta T_{\rm eff} = +44$ K, $\sigma = 213$ K for giants, consistent with Mann et al. (2012).



The apparent differences between Mann et al. (2012) and Pinsonneault et al. (2012) can be explained by the fact that Mann et al. (2012) focus on systems with $T_{\rm eff} < 4000$ K while Pinsonneault et al. (2012) consider systems with $T_{\rm eff} > 4000$ K.



5 DISCUSSION 

We arrived at a synthetic model of the KIC and of the corresponding target-selected subsample by adapting a full stellar and binary star population synthesis model to the specific circumstances of the Kepler field and the Kepler detector. In order to do so we necessarily had to adopt a number of simplifications and ad-hoc assumptions. Here we discuss the potential impact these may have on our results, and what improvements further work should consider.

The lack of a realistic metallicity distribution is, we believe, the most limiting simplification of our model. The current design of our population synthesis procedure makes the inclusion of an initial metallicity that continuously varies with Galactic epoch computationally too expensive. The adopted bimodal model highlights the variation with $Z$, but neither spans the full range of metallicities implied by SCP fits, nor covers the bracketed range in a continuous fashion. To mimic the effect of a continuous metallicity distribution we had to introduce small, random perturbations of the calculated stellar magnitudes. This approach, however, cannot fully capture the effect of metallicity on evolutionary timescales and system appearance: metal-poor stars have a shorter MS life and are less luminous than stars with solar metallicity, ultimately resulting in differences in their respective distributions in the colour-colour diagrams we set out to match.

In this context we note that the SCP itself is inconsistent in its use of metallicity. In assigning a metallicity to a given object the SCP disregards the metallicity from the stellar input models (Castelli & Kurucz 2004 and Girardi et al. 2000) and exclusively relies on solar metallicity models (Brown et al. 2011).



To force a better agreement between the synthetic sample and the real KIC we applied a series of colour-correction terms to the synthetic stars. The resulting SCP-derived parameters of the synthetic stars turn out to be sensitive to these corrections, so great care has to be taken not to introduce spurious features into the synthetic distributions. We expect that the introduction of a realistic metallicity, whilst keeping a Gaussian perturbation approach to model photometric uncertainties, would reduce these corrections to a term dependent on the difference between the SDSS magnitudes and the KIC filter system. Such a term can then be independently constrained by e.g. Pinsonneault et al. (2012).

We also chose not to make use of the infrared colours in the SCP parameter estimation process. We would expect these to have an impact primarily on the low $T_{\rm eff}$ region, and so this may have a bearing on the number of misclassified giants (see Fig. 3).

We introduced a series of ad-hoc corrections to the target selection method that were designed to increase the number of faint dwarfs and the relative fraction of giants. With these corrections in place we succeeded in reproducing the Q2 target list from the KIC. When applied to the synthetic sample a small systematic bias emerges that increases the number of CHe and AGB stars while leaving GB stars unaffected (see the low-temperature 'wing' in Fig. 5 vs Fig. 6).

For the purpose of the current study we chose to keep 
commonly used population parameters fixed. There is con- 
siderable uncertainty in some of them, and we will present 
a systematic study of their impact on the general properties 
of the synthetic sample in a separate paper. 

The total binary fraction is, somewhat arbitrarily, set at 50%. This allows us to study the differential effect of the SCP and the target selection on the binary content in general. In reality the binary fraction is likely to be a function of stellar mass, reaching values near 100% for high-mass stars. The choice of binary fraction becomes a more important concern when considering Kepler's sample of eclipsing binaries or the false positive transit signal.

The high-mass end of the IMF is well constrained, and indeed the overall normalisation of the SFR is based on this. However, the shape of the IMF below ~0.5 $M_\odot$ is more uncertain, and we tested whether this offered a way to boost the number of faint objects in the target-selected synthetic sample. We found that varying the low-mass IMF within physically reasonable limits does not significantly change the magnitude distribution of the synthetic sample.

Our choice of a flat initial mass ratio distribution (IMR) 
is consistent with what is often used in population synthesis 
studies, including those by Girardi et al. (2005) (but 
note that these authors add binaries to the population of 
single stars in an ad hoc way, while our model treats evolving 
binaries self-consistently). An IMR that instead favours 
equal-mass companions would increase the population of systems 
with near-identical components. The SCP will assign 
the correct stellar parameters of one component to the combined 
object, so the net effect is that a magnitude-limited 
sample such as the KIC will include relatively more of these 
objects, as they are intrinsically brighter and hence can be 
seen out to larger distances. We therefore expect that the 
binary fraction in a synthetic KIC with such an IMR will 
increase over the case of a flat IMR. The converse is true 
for an IMR that favours unequal mass ratios, where the SCP 
would also pick out the correct parameters for the primary, 
and the binary fraction in the KIC is expected to be the 
same as the underlying, intrinsic binary fraction. 
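
As a rough illustration of this magnitude-limit argument, the following Python sketch estimates how much further an unresolved binary with two identical components can be seen at a fixed apparent magnitude limit. It assumes the inverse-square law only, with no extinction and a uniform space density, neither of which holds in the real Kepler field. The function names are ours and are not part of BiSEPS or the SCP.

```python
import math


def magnitude_gain(flux_ratio):
    """Magnitudes by which a system brightens if its flux rises by flux_ratio."""
    return 2.5 * math.log10(flux_ratio)


def distance_gain(flux_ratio):
    """Factor by which the limiting distance grows at a fixed magnitude limit.

    Assumes flux falls off as 1/d^2 and ignores extinction.
    """
    return math.sqrt(flux_ratio)


def volume_gain(flux_ratio):
    """Factor by which the sampled volume grows, assuming uniform space density."""
    return distance_gain(flux_ratio) ** 3


if __name__ == "__main__":
    # Two identical, unresolved components double the flux of a single star.
    equal_mass_flux_ratio = 2.0
    print("magnitude gain:", magnitude_gain(equal_mass_flux_ratio))
    print("distance gain: ", distance_gain(equal_mass_flux_ratio))
    print("volume gain:   ", volume_gain(equal_mass_flux_ratio))
```

Under these idealised assumptions a doubling of the flux corresponds to a gain of about 0.75 mag, a factor √2 in limiting distance and a factor 2^1.5 ≈ 2.8 in sampled volume. Extinction and the Galactic density gradient would modify these figures.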

In our population model we have ignored the fact that 
binaries form with eccentric orbits and circularise on a finite, 
system-dependent timescale. Instead we kept binary orbits 
circular at all times. This seems justified as Hurley et al. 
(2002) showed that the circularisation timescale for interacting 
binaries is short enough to not alter bulk properties 
of the binary population from one where the eccentricity is 
kept at 0. There is also no suggestion that the eccentricity 
of a binary orbit would have any effect on the system's 
detectability in the Kepler field. 

The adopted Galactic absorption model obviously affects 
the make-up of the synthetic KIC, but with the smallest 
scale modelled by Drimmel et al. (2003) being 0.35° × 0.35° 
we deem this well suited to resolve the statistics of the 
larger Kepler field. We note that the Kepler team assumes a 
smooth, exponentially decaying absorbing disk (Brown et al. 
2011), which on average returns a larger extinction for a given 
distance than Drimmel et al. (2003). The Kepler team quote 
that most of the target-selected stars are within 1 kpc from 
the Sun, with only ~50% of objects suffering a V-band extinction 
A_V < 0.4. In contrast, in our model only 30% of 
the target-selected stars are at < 1 kpc, while 70% are at 
< 2 kpc, which also corresponds to A_V < 0.4. 
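
To make the notion of a smooth, exponentially decaying absorbing disk concrete, the sketch below integrates a dust density that falls off exponentially with height above the Galactic plane along a line of sight at Galactic latitude b. This is a generic textbook form, not the Kepler team's implementation. The in-plane extinction rate, the dust scale height and the latitude used in the example are illustrative assumptions that are not taken from the text, and the observer is assumed to sit in the midplane.

```python
import math


def a_v_exponential_disk(d_kpc, b_deg, k_mag_per_kpc=1.0, h_kpc=0.15):
    """V-band extinction out to distance d_kpc at Galactic latitude b_deg.

    Integrates k * exp(-z / h) along the line of sight, with z = s * sin|b|.
    k_mag_per_kpc and h_kpc are illustrative assumptions, not fitted values.
    """
    sin_b = abs(math.sin(math.radians(b_deg)))
    if sin_b < 1e-12:
        # Line of sight stays in the plane: extinction grows linearly.
        return k_mag_per_kpc * d_kpc
    path_scale = h_kpc / sin_b  # e-folding length along the line of sight
    return k_mag_per_kpc * path_scale * (1.0 - math.exp(-d_kpc / path_scale))


if __name__ == "__main__":
    # Hypothetical latitude of roughly 13.5 degrees, chosen for illustration.
    for d in (0.5, 1.0, 2.0):
        print(d, "kpc:", round(a_v_exponential_disk(d, 13.5), 3), "mag")
```

In such a model the extinction saturates once the line of sight has left the dust layer, so A_V at a given distance depends strongly on the assumed scale height. A three-dimensional map such as that of Drimmel et al. (2003) need not share this behaviour.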

We will address the impact of these population parame- 
ters on the synthetic Kepler field in a separate study, where 
we will attempt to extract constraints on the binary fraction 
and initial distribution functions from the observed eclips- 
ing binaries in the Kepler field, and from the statistics of 
the rare cases of binaries that show an asteroseismological 
signal from both components. 

In order to accomplish a proper treatment of a variable 
initial metallicity, or to explicitly take into account initial eccentricity 
distributions, or additional distributions of physical 
parameters characterising the stellar population such as 
stellar rotation rates, we have to alter the central concept of 
our population code. For this task we need to switch from 
sampling of the Galactic distribution function T, introduced 
in Sec. 2.7, to a Monte-Carlo sampling of the initial distribution 
functions, then only evolving those systems we have 
sampled. Depending on the desired application this may reduce 
the number of systems to be evolved, and we can replace 
the analytic fits currently used by BiSEPS to describe 
stellar evolution with a numerical, state-of-the-art 1D stellar 
evolution model, like MESA (Paxton et al. 2011). 
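
A minimal sketch of what such a Monte-Carlo sampling of the initial distribution functions could look like is given below. Only the 50% binary fraction and the flat mass ratio distribution are taken from the discussion above. The single power-law IMF with a Salpeter-like slope, the mass limits, the lower bound on the mass ratio and all function names are assumptions made purely for illustration, and the subsequent evolution of the sampled systems is not shown.

```python
import random


def sample_power_law(rng, m_min, m_max, alpha):
    """Draw a mass from dN/dm proportional to m**(-alpha), alpha != 1,
    by inverting the cumulative distribution."""
    u = rng.random()
    k = 1.0 - alpha
    return (m_min ** k + u * (m_max ** k - m_min ** k)) ** (1.0 / k)


def sample_systems(n, binary_fraction=0.5, m_min=0.1, m_max=20.0,
                   alpha=2.35, q_min=0.1, seed=1):
    """Return n systems as (m1, m2) tuples; m2 is None for single stars.

    alpha, m_min, m_max and q_min are illustrative assumptions.
    """
    rng = random.Random(seed)
    systems = []
    for _ in range(n):
        m1 = sample_power_law(rng, m_min, m_max, alpha)
        if rng.random() < binary_fraction:
            q = rng.uniform(q_min, 1.0)  # flat initial mass ratio distribution
            systems.append((m1, q * m1))
        else:
            systems.append((m1, None))
    return systems


if __name__ == "__main__":
    sample = sample_systems(10000)
    n_bin = sum(1 for _, m2 in sample if m2 is not None)
    print("sampled binary fraction:", n_bin / len(sample))
```

Only the systems drawn in this way would then be passed to the stellar evolution code. Further initial distributions, such as metallicity, eccentricity or rotation rate, could be added as extra draws per system.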



6 CONCLUSIONS 

In this work we presented a comprehensive population syn- 
thesis model of the Kepler field, taking into account single 
and binary star evolution. We have also modelled the selec- 
tion effects inherent in the Kepler objects of interest, the 
SCP parameter estimation, Kepler's instrumental noise and 
the targeted selection of systems with the highest chance 
of detecting an Earth-like planet around a Sun-like star in 
the HZ. The main output of this procedure is a synthetic 
catalogue of systems in the Kepler field. This catalogue was 
the basis for a comparison between the real physical pa- 
rameters of the catalogue stars, as indicated by the popula- 
tion model, and the corresponding SCP-derived parameters. 
Such a comparison over the bulk of the Kepler field is only 
possible with a full theoretical population model; purely ob- 
servational tests of the SCP performance will always be lim- 
ited to a small sample of stars on the basis of bespoke spec- 
tral fitting. Using the synthetic sample we also investigated 
the effect of the target selection method on the underlying 
distributions in both SCP and real parameter space. 

We found satisfactory agreement between the synthetic 
KIC and the real KIC in colour-colour space, and between 
our target selection method and the Q2 target selection. 
Our simulations highlight a difference between the physical 
parameters of the stars in the synthetic sample and those 
derived by the SCP for the synthetic sample. We conclude 
that this systematic difference also exists for the SCP-derived 
parameters of the objects in the real KIC. Specifically, 
for systems containing a MS star, the SCP-derived 
parameters deviate on average by ΔT_eff ≈ 500 K and 
Δlog g ≈ −0.2 dex from the real physical parameters. 
In the case of GB stars the deviation is ΔT_eff ≈ 50 K and 
Δlog g ≈ −0.5 dex. This has the remarkable consequence 
that the SCP-derived stellar radii of MS stars are on average 
too large by ~3%. If these radii are used to estimate 
the radius of any planet observed to be transiting, then the 
planet radius will be 3% too large. 

After correcting for selection effects we find that these 
results are consistent with differences highlighted by other 
authors, on the basis of observational consideration of subsamples. 
The average deviation for a given stellar type is 
observed regardless of whether the star is single or in a binary. 

Our models confirm that the Kepler target selection 
procedure increases the fraction of main-sequence stars, 
from about 75% to 80%, and decreases the fraction of gi- 
ants, from 25% to 20%, relative to the KIC. In fact, our 
population synthesis approach is the only way to quantify 
this bias; the figures demonstrate that the change is only 
moderate. 

The bias introduced into the target-selected sample is 
roughly the same for single stars and binary systems. We 
also found that the target selection has a negligible effect on 
the binary fraction, and that it does not alter the relative 
fractions of systems with different stellar evolution types, 
when compared to the single star population. 

The techniques presented here will be used in a future 
study to interpret the binary sample observed by Kepler, 
and to re-assess the Kepler false positive rate. 